Recognition of digit strings in noisy speech with limited resources
Abstract
Automatic recognition of continuously spoken digits (e.g., telephone numbers or credit card numbers) is feasible with excellent accuracy, even for speaker-independent applications over telephone lines. However, even such relatively simple recognition tasks suffer decreased performance in adverse conditions, such as significant background noise or fading on portable telephone channels. If an application further imposes significant limitations on the computing resources for the recognition task, then robust limited-resource speech recognition remains a suitable challenge, even for a vocabulary as simple as the digits. Since connected-digit recognition over telephone lines is a very practical application, the amount of computer resources needed for a given level of recognition accuracy was investigated for different acoustic noise conditions. Rather than use a traditional hidden Markov model approach with cepstral analysis, which is computationally intensive and does not always work well under adverse acoustic conditions, simpler spectral analysis was used, combined with a segmental approach. The limited nature of the vocabulary (i.e., 10 digits) allows this simpler approach. High recognition accuracy can be maintained despite a large decrease (vs. traditional methods) in both memory and computation.
Similar resources
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Quantile based histogram equalization for online applications
The noise robustness of automatic speech recognition systems can be increased by transforming the signal to make the cumulative density functions of the signal's values in recognition match the ones that were estimated on the training data. This paper describes a real-time online algorithm to approximate the cumulative density functions, after Mel scaled filtering, using a small number of quan...
The Effects of Speech on Speech Recognition and Tex
Speech is a natural input/output modality for wireless access to information on the web. One way to overcome resource constraints on current wireless devices is to locate the speech recognition and text-to-speech systems on remote servers and transmit compressed speech between the server and wireless clients. To this end, we evaluated the effects of speech compression on the performance of a co...
Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers
This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development softwa...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Publication date: 2000